Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.
By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.
\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.
\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).
\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.
The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.
\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.
Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv
Source: https://github.com/pcm-dpc/COVID-19
url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)Warnings
- 21/04/2020: dati Regione Lombardia parziali (casi testati non aggiornati)
- 20/04/2020: dati Regione Lombardia ricalcolati (ricalcolo di casi testati - eliminazione duplicati)
- 15/04/2020: dati Regione Friuli Venezia Giulia ricalcolati (ricalcolo di isolamento domiciliare e dimessi/guariti)
- 12/04/2020: dati P.A. Bolzano ricalcolati (ricalcolo dati guariti -110 rispetto a ieri).
- 10/04/2020: dati Regione Molise parziali (dato tamponi non aggiornato).
- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi
# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$totale_casi,
dy = reldiff(COVID19$totale_casi))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 16425.171226 1933.429462 8.495 0.0000000000104 ***
## th2 0.044178 0.002373 18.618 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18900 on 57 degrees of freedom
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 0.000007949mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 187092.4736 2432.3558 76.92 <2e-16 ***
## xmid 35.1068 0.3240 108.37 <2e-16 ***
## scal 7.8824 0.2191 35.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4127 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000005964mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 1000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 217031.4842256 1813.3913935 119.68 <2e-16 ***
## b2 8.7658675 0.1924948 45.54 <2e-16 ***
## b3 0.9348383 0.0008803 1061.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1474 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000002111richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "plinear",
control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging...
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 227764.649344 2386.274917 95.45 <2e-16 ***
## th2 0.057807 0.001103 52.42 <2e-16 ***
## th3 6.177901 0.176277 35.05 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1380 on 56 degrees of freedom
##
## Number of iterations to convergence: 2
## Achieved convergence tolerance: 0.05427
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3,
# data = data, start = start, trace = TRUE)models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -663.6594 | 3 | 0.9281639 | 1333.319 | 1333.755 | 1339.551 | |
| Logistic model | -573.3658 | 4 | 0.9967129 | 1154.732 | 1155.472 | 1163.042 | |
| Gompertz model | -512.6440 | 4 | 0.9995157 | 1033.288 | 1034.029 | 1041.598 | |
| Richards model | -508.7453 | 4 | 0.9995964 | 1025.491 | 1026.231 | 1033.801 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Infected", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 10000),
minor_breaks = seq(0, coef(mod2)[1], by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(100,NA)) +
labs(y = "Infected (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
minor_breaks = seq(0, max(ylim), by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 60 2020-04-23 232638 185007 285323
## 601 2020-04-23 179464 169478 187868
## 602 2020-04-23 186090 182265 189452
## 603 2020-04-23 187300 183964 190984
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$deceduti,
dy = reldiff(COVID19$deceduti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1553.376320 203.337810 7.639 0.000000000274 ***
## th2 0.050029 0.002589 19.324 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2366 on 57 degrees of freedom
##
## Number of iterations to convergence: 13
## Achieved convergence tolerance: 0.000003669mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 25241.1405 336.5398 75.00 <2e-16 ***
## xmid 37.8922 0.3022 125.38 <2e-16 ***
## scal 7.3436 0.1977 37.14 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 507.7 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000008698mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 10000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 29869.0792220 255.4479313 116.93 <2e-16 ***
## b2 12.1673828 0.2976476 40.88 <2e-16 ***
## b3 0.9320715 0.0008753 1064.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 173.7 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000001191richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 30957.264761 309.819074 99.92 <2e-16 ***
## th2 0.063332 0.001064 59.51 <2e-16 ***
## th3 9.266645 0.277183 33.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 164.2 on 56 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 0.000001949models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -541.0797 | 3 | 0.9386723 | 1088.1594 | 1088.5958 | 1094.3921 | |
| Logistic model | -449.7457 | 4 | 0.9972658 | 907.4914 | 908.2321 | 915.8015 | |
| Gompertz model | -386.4694 | 4 | 0.9996260 | 780.9388 | 781.6795 | 789.2489 | |
| Richards model | -383.1521 | 4 | 0.9996724 | 774.3043 | 775.0450 | 782.6144 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Deceased", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Deceased (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 60 2020-04-23 31255 24918 38120
## 601 2020-04-23 24056 22804 24952
## 602 2020-04-23 24981 24529 25360
## 603 2020-04-23 25102 24700 25531
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$dimessi_guariti,
dy = reldiff(COVID19$dimessi_guariti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1371.290549 122.164586 11.22 4.6e-16 ***
## th2 0.063610 0.001705 37.30 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2231 on 57 degrees of freedom
##
## Number of iterations to convergence: 11
## Achieved convergence tolerance: 0.000007994mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 73598.7145 2792.9998 26.35 <2e-16 ***
## xmid 50.3995 0.8027 62.79 <2e-16 ***
## scal 9.6391 0.2682 35.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 879.6 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000001736mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 164909.689430 10081.948384 16.36 <2e-16 ***
## b2 8.299957 0.149033 55.69 <2e-16 ***
## b3 0.966801 0.001104 876.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 466.4 on 56 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000007239richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, # algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 415547.191472 71871.639287 5.782 0.000000343288776 ***
## th2 0.015105 0.001568 9.635 0.000000000000173 ***
## th3 3.882293 0.152363 25.481 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 379.7 on 56 degrees of freedom
##
## Number of iterations to convergence: 20
## Achieved convergence tolerance: 0.000005402models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -537.5939 | 3 | 0.9856972 | 1081.1878 | 1081.6241 | 1087.4204 | |
| Logistic model | -482.1689 | 4 | 0.9976262 | 972.3378 | 973.0785 | 980.6479 | |
| Gompertz model | -444.7400 | 4 | 0.9992496 | 897.4800 | 898.2208 | 905.7902 | |
| Richards model | -432.5988 | 4 | 0.9994851 | 873.1977 | 873.9384 | 881.5078 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Recovered", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Recovered (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 60 2020-04-23 62323 55863 68880
## 601 2020-04-23 53747 51110 55928
## 602 2020-04-23 55186 53989 56255
## 603 2020-04-23 55716 54804 56582
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
positives = c(NA, diff(COVID19$totale_casi)),
swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )ggplot(df, aes(x = date)) +
geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
geom_line(aes(y = swabs, color = "swabs")) +
geom_point(aes(y = positives, color = "positives"), pch = 0) +
geom_line(aes(y = positives, color = "positives")) +
labs(x = "", y = "Number of cases", color = "") +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = palette()[c(2,1)]) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = y)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col=palette()[4]) +
geom_line(size = 0.5, col=palette()[4]) +
labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
scale_y_continuous(labels = scales::percent_format(),
breaks = seq(0, 0.5, by = 0.05)) +
coord_cartesian(ylim = c(0,max(df$y, na.rm = TRUE))) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1ggplot(df, aes(x = date, y = hospital)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "orange") +
geom_line(size = 0.5, col = "orange") +
labs(x = "", y = "Change hospitalized patients") +
coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$hospital, na.rm = TRUE),
max(df$hospital, na.rm = TRUE),
by = 100)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = icu)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "red2") +
geom_line(size = 0.5, col = "red2") +
labs(x = "", y = "Change ICU patients") +
coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$icu, na.rm = TRUE),
max(df$icu, na.rm = TRUE),
by = 10)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))